Using Sub-sequence Information with kNN for Classification of Sequential Data
نویسندگان
چکیده
With the enormous growth of data, which exhibit sequentiality, it has become important to investigate the impact of embedded sequential information within the data. Sequential data are growing enormously, hence an efficient classification of sequential data is needed. k-Nearest Neighbor (kNN) has been used and proved to be an efficient classification technique for two-class problems. This paper uses sliding window approach to extract sub-sequences of various lengths and classification using kNN. We conducted experiments on DARPA 98 IDS dataset using various distance/similarity measures such as Jaccard similarity, Cosine similarity, Euclidian distance and Binary Weighted Cosine (BWC) measure. Our results demonstrate that sub-sequence information enhances kNN classification accuracy for sequential data, irrespective of the distance/similarity metric used.
منابع مشابه
Detection of some Tree Species from Terrestrial Laser Scanner Point Cloud Data Using Support-vector Machine and Nearest Neighborhood Algorithms
acquisition field reference data using conventional methods due to limited and time-consuming data from a single tree in recent years, to generate reference data for forest studies using terrestrial laser scanner data, aerial laser scanner data, radar and Optics has become commonplace, and complete, accurate 3D data from a single tree or reference trees can be recorded. The detection and identi...
متن کاملAn Improved K-Nearest Neighbor with Crow Search Algorithm for Feature Selection in Text Documents Classification
The Internet provides easy access to a kind of library resources. However, classification of documents from a large amount of data is still an issue and demands time and energy to find certain documents. Classification of similar documents in specific classes of data can reduce the time for searching the required data, particularly text documents. This is further facilitated by using Artificial...
متن کاملAn Improved K-Nearest Neighbor with Crow Search Algorithm for Feature Selection in Text Documents Classification
The Internet provides easy access to a kind of library resources. However, classification of documents from a large amount of data is still an issue and demands time and energy to find certain documents. Classification of similar documents in specific classes of data can reduce the time for searching the required data, particularly text documents. This is further facilitated by using Artificial...
متن کاملAdvances in Classification of Sequence Data
In recent years, advanced information systems have enabled a collection of increasingly large amounts of data that are sequential in nature. To analyze huge amounts of sequential data, the interdisciplinary field of knowledge discovery in databases (KDD) is very useful. The most important step within the process of KDD is data mining, which is concerned with the extraction of the valid patterns...
متن کاملA Novel Scheme for Improving Accuracy of KNN Classification Algorithm Based on the New Weighting Technique and Stepwise Feature Selection
K nearest neighbor algorithm is one of the most frequently used techniques in data mining for its integrity and performance. Though the KNN algorithm is highly effective in many cases, it has some essential deficiencies, which affects the classification accuracy of the algorithm. First, the effectiveness of the algorithm is affected by redundant and irrelevant features. Furthermore, this algori...
متن کاملذخیره در منابع من
با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید
عنوان ژورنال:
دوره شماره
صفحات -
تاریخ انتشار 2005